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Abstract 

Granger causality analysis is a popular method for inference on directed interactions in complex 
systems of many variables. A shortcoming of the standard framework for Granger causality is 
that it only allows for examination of interactions between single (univariate) variables within 
a system, perhaps conditioned on other variables. However, interactions do not necessarily take 
place between single variables, but may occur among groups, or "ensembles" , of variables. In 
this study we establish a principled framework for Granger causality in the context of causal 
interactions among two or more multivariate sets of variables. Building on Geweke's seminal 1982 
work, we offer new justifications for one particular form of multivariate Granger causality based 
on the generalized variances of residual errors. Taken together, our results support a comprehen- 
sive and theoretically consistent extension of Granger causality to the multivariate case. Treated 
individually, they highlight several specific advantages of the generalized variance measure, which 
we illustrate using applications in neuroscience as an example. We further show how the measure 
can be used to define "partial" Granger causality in the multivariate context and we also motivate 
reformulations of "causal density" and "Granger autonomy" . Our results are directly applicable 
to experimental data and promise to reveal new types of functional relations in complex systems, 
neural and otherwise. 
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1 Introduction 

A key challenge across many domains of science and engineering is to understand the behavior of 
complex systems in terms of dynamical interactions among their component parts. A common way 
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to address this challenge is by analysis of time series data acquired simultaneously from multiple 
system components. Increasingly, such analysis aims to draw inferences about causal interactions 
among system variables [1, 2, 3], as a complement to standard assessments of undirected functional 
connectivity as revealed by coherence, correlation, and the like. 

A first step in any dynamical analysis is to identify target variables. Typically, subsequent analysis 
then assumes that functional (causal) interactions take place among these variables. However, in 
the general case it may be that explanatorily relevant causal interactions take place among groups, 
or "ensembles", of variables [4, 5]. It is important to account for this possibility for at least two 
reasons. First, identification of target variables is usually based on a priori system knowledge 
or technical constraints, which may be incomplete or arbitrary, respectively. Second, even given 
appropriate target variables, it is possible that relevant interactions may operate at multiple scales 
within a system, with larger scales involving groups of variables. Consider an example from functional 
neuroimaging. In a typical fMRI 1 study, the researcher may identify a priori several "regions-of- 
interest" (ROI) in the brain, each represented in the fMRI dataset by multiple voxels, where each 
voxel is a variable comprising a single time series reflecting changes in the underlying metabolic 
signal. Assuming that the objective of the study is to assess the causal connectivity among the 
ROIs, a standard approach is to derive a single time series for each ROI either by averaging or by 
extracting a principal component [6]; alternatively, repeated pairwise analysis can be performed on 
each pair of voxels. A more appropriate approach, however, may be to consider causal interactions 
among the multivariate groups of voxels comprising each ROI. Similar scenarios could be concocted 
in a very wide range of application areas, including economics, biology, climate science, among others. 

In this paper, we describe a principled approach to assessing causal interactions among multi- 
variate groups of variables. Our approach is based on the concept of Granger causality (G-causality) 
[7, 8], a statistical notion of causality which originated in econometrics but which has since found 
widespread application in many fields, with a particular concentration in the neurosciences [1, 9]. 
G-causality is an example of time series inference on stochastic processes and is usually implemented 
via autoregressive modeling of multivariate time series. The basic idea is simple: one variable (or 
time series) can be called "causal" to another if the ability to predict the second variable is improved 
by incorporating information about the first. More precisely, given inter-dependent variables X and 
Y, it is said that U Y G-causes X" if, in a statistically suitable manner, Y assists in predicting the 
future of X beyond the degree to which X already predicts its own future. It is straightforward 
to extend G-causality to the conditional case [5], where Y is said to G-cause X, conditional on Z, 
if Y assists in predicting the future of X beyond the degree to which X and Z together already 
predict the future of X. Importantly, conditional G-causality is orthogonal to the notion of inferring 
causality among groups of variables, which is the focus of the present paper and which we term mul- 
tivariate G-causality. In the multivariate case, the above description of G-causality is generalized to 
interactions among sets of interdependent variables X, Y, and (in the conditional multivariate case) 
Z. The generalization we propose was originally introduced in the field of econometrics by Geweke 
in 1982 [5], but has since been almost totally overlooked. Indeed a different measure has recently 
appeared [4]. In the following, we derive several justifications for preferring Geweke's measure, some 
of which we examine numerically. We go on to explore a series of implications for the analysis of 
complex systems in general, with a particular focus on applications in neuroscience. 

functional magnetic resonance imaging. 
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After laying out our conventions in Section 2, in Section 3 we introduce two alternative measures 
of multivariate G-causality. The formulations differ according to their treatment of the covariance 
matrices of residuals in the underlying autoregressive models: Geweke's measure uses the determinant 
of this matrix (the generalized variance) , while the other uses the trace (the total variance) . Section 
4 explores several advantageous properties of the determinant formulation as compared to the trace 
formulation. In brief, the determinant formulation is fully equivalent to transfer entropy [3] under 
Gaussian assumptions, is invariant under a wider range of variable transformations, is expandable 
as a sum of standard univariate G-causalities, and admits a satisfactory spectral decomposition. 
Numerically, we show that Geweke's measure is just as stable as is the alternative measure based on 
the total variance. Section 5 extends the determinant formulation to the important case of "partial" 
G-causality which provides some measure of control with respect to unmeasured latent or exogenous 
variables. Section 6 extends a previously defined measure of "causal density" [10] which reflects the 
overall dynamical complexity of causal interactions sustained by a system. In Section 7 we show how 
multivariate G-causality can enhance a measure of "autonomy" (or "self-causation") based on G- 
causality [11], and Section 8 carries the discussion towards the identification of macroscopic variables 
via the notion of causal independence. Section 9 provides a general discussion and summary of 
contributions. 



2 Notational conventions and preliminaries 

We use a mathematical vector /matrix notation in which bold type generally denotes vector quantities 
and upper-case type denotes matrices or random variables, according to context. All vectors are 
considered to be column vectors. '©' denotes vertical concatenation of vectors, so that for x = 
(xi, . . . ,x n ) J and y = (yi, . . . ,y m ) J , x®y is the vector (x±, . . . ,x n ,y 1 , . . . ,y m ) J of dimension ra + m, 
where the symbol ' T ' denotes the transpose operator. We also write |-| for the determinant and tr(-) 
for the trace of a square matrix. 

Given jointly distributed multivariate random variables (i.e. random vectors) X,Y, we denote 
by E(JT) the n x n matrix of covariances cov(Aj, Xj) and by T,(X,Y) the n x m matrix of cross- 
covariances cav(Xi,Y a ). We then use S(X \ Y) to denote the n x n matrix 

E(X | Y) = E(X) - E(X, Y) T.iYy 1 E(X, Y) J , (1) 

defined when T,(Y) is invertible. E( X \ Y) appears as the covariance matrix of the residuals of a 
linear regression of X on Y (c.f. Eq. (6) below); thus, by analogy with partial correlation [12] we 
term E(X | Y) the partial covariance 2 of X given Y. Similarly, given another jointly distributed 
variable Z, we define the partial cross- covariance 

E(X,y| Z) = E(X, Y) - E(X, Z) E(Z)" 1 E(Y", Z) T . (2) 

The following identity [13] will be useful for deriving certain properties of multivariate G-causality: 

|E(x|y)| = |E(xe^)|/|E(y)| . (3) 

2 This is to be distinguished from the conditional covariance, which will in general be a random variable, though 
later we note that for Gaussian variables the notions coincide. 
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Suppose we have a multivariate stochastic process Xt m discrete time 3 (i.e. the random variables 

Xu are jointly distributed). We use the notation x[ p ^ = X t © Xt-\ © ... © X t ~ p +i to denote X 

(p) 

itself, along with p — 1 lags, so that for each t, X\ is a random vector of dimension pn. Given the 

— (p) 

lag p, we also often use the shorthand notation X t = X^_\ for the lagged variable. 

3 Multivariate Granger causality 

G-causality analysis is concerned with the comparison of different linear regression models of data. 
Thus, let us consider the (multivariate) linear regression of one random vector X, the predictee, on 
another random vector Y, the predictor: 4 

X = A ■ Y + e , (4) 

where the nxm matrix A contains the regression coefficients and the random vector e = (ei, . . . , e n ) J 
comprises the residuals. The coefficients of this model are uniquely specified by imposing zero 
correlation between the residuals e and the regressors (predictors) Y. Via the Yule- Walker procedure 
[1, 13] one obtains 

A = E(X, Y) S(Y") _1 (5) 
and finds the covariance matrix of the residuals to be given by 

£(e) = £( X | Y) , (6) 

with Ti(X | Y) defined as in (1). 

Suppose now we have three jointly distributed, stationary multivariate stochastic processes 
Xt,Yf, Zf. Then to measure the G-causality from Y to X given Z, one wants to compare the 
following two multivariate autoregressive (MVAR) models for the processes [8]: 



X t = A ■ (x£\ © Z t ( :\) + e t , 
X t = A'.(xfX®Y}lUz^) 



(7) 



Thus the predictee variable X is regressed firstly on the previous p lags of itself plus r lags of the 
conditioning variable Z and secondly, in addition, on q lags of the predictor variable Y (in theory, 
if not in practice, p, q and r could be infinite). 6 

The standard measure of G-causality used in the literature is defined only for univariate predictor 
and predictee variables Y and X, and is given by the log of the ratio of the residual variances for 



3 While our analysis may be extended to continuous time we focus here on the discrete time case. 

4 Here and in the remainder of this paper we assume, without loss of generality, that all random vectors and random 
processes have zero mean; thus constant terms are omitted in all linear regressions. 

5 The analysis carries through for the non-stationary case, but for simplicity we assume here that all processes are 
stationary. 

6 This might be more familiar as conditional G-causality, with Z the conditioning variable. In practice it is the more 
useful form; for the non-conditional version, Z may simply be omitted. 
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the regressions (7). In our notation, 7 



In 



var(et) 
var(e£) 



/ s(x|x-®z-) \ 

Vs(^|x-er-ez-)y' ' w 

where the last equality follows from the general formula (6). By stationarity this expression does not 
depend on time t. Note that the residual variance of the first regression will always be larger than 
or equal to that of the second, so that J~y^x\z > always. As regards statistical inference, it is 
known that the corresponding maximum likelihood estimator 8 J-y^-x \z w iH have (asymptotically for 
large samples) a ^-distribution under the null hypothesis J~y->x\z = [14, 15], and a non-central 
X 2 -distribution under the alternative hypothesis J 7 y->x\z > [5, 16]. 

We now consider the case where predictee and predictor variables are no longer constrained to be 
univariate, i.e. multivariate G-causality. For a multivariate predictor, Eq. (8) above (with Y replaced 
by the bold-type Y) is a valid and consistent formula for G-causality. However, for the case of a 
multivariate predictee there is not yet a standard definition for G-causality. One possibility is to 
simply use the multivariate mean square error (i.e. total variance, or expected squared length of the 
multivariate residual), leading to 



T tr , = In 
^ Y^X Z — 111 



tr(S(e*)) 
tr(S(e{)) 



= m( tr(E ( xix-ez-)) y 

Vtr(E(X \X-QY-® Z-)) ) V ; 

We call this the trace version of multivariate G-causality (trvMVGC). As recently noted by Ladroue 
and colleagues [4] trvMVGC appears to be a natural extension of G-causality to the multivariate 
case because total variance is a common choice for a measure of goodness-of-fit or prediction error 
for a multivariate regression. Moreover, the measure is always non-negative, reduces to (8) when 
the predictee variable is univariate, and the regression matrix coefficients that render the residuals 
uncorrelated with the regressors also minimize the total variance (this is just the "ordinary least 
squares" procedure, minimizing mean square error). Nonetheless, an alternative originally proposed 
by Geweke [5] uses instead the generalized variance |S(£j)|, which quantifies the volume in which 
the residuals lie. This leads to the measure 



F Y^x\z = l n 



|S(et) 



|S(X|X-0Z-)| 

^ Wx\x-<by-®z-)\) ■ (10) 



7 Note that even though X and Y are univariate, the lagged variables X~ and Y~ will generally be multivariate (at 
least if p, q > 1); hence they are written in bold type. 

s We remark that for significance testing of G-causality it is quite common to use the appropriate _F-statistic for the 
regressions (7) rather than J-y^x \z itself [8, 17]; the quantities are in any case related by a monotonic transformation. 
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Like trvMVGC, this measure is always non- negative, reduces to (8) when the predictee variable is 
univariate, and is consistent with the autoregressive approach inasmuch as the Yule- Walker regression 
matrix coefficients minimize the generalized variance, |E(e)|, as well as the total variance, (see 
Appendix A for a proof). Geweke [5] lists a number of motivations for taking J~y^x\z as given 
in Eq. (10) as the natural extension of G-causality to the multivariate case. These include: (i) 
that the generalized variance version (10) is invariant under (linear) transformation of variables 
(see Section 4.2); and (ii) that the maximum likelihood estimator of this quantity, J~y^x \z> is 
asymptotically x 2 -distributed for large samples. In the following section we further justify this choice. 
Since we advocate the use of Geweke's measure (10) of multivariate G-causality we abbreviate this 
simply as MVGC henceforth. 

As remarked previously, the expression (10) defines conditional MVGC. Geweke [18] gives the 
following intuitively appealing expression for J~y~>x \z m terms of unconditional MVGCs: 

Fy^x \z = Fy®z^x - Fz-^x ; (11) 

that is, the extent to which Y and Z together cause X less the extent that Z on its own causes X. 
Note that this identity also holds for trvMVGC. 

4 Properties of Multivariate Granger causality 

In the following subsections we discuss some properties of MVGC and further motivate Geweke's 
definition of this measure. 

4.1 Gaussian Equivalence with Transfer Entropy 

When all variables are Gaussian distributed, the MVGC J~y^x \z 1S fully equivalent to the transfer 
entropy T V-s-x \Zi an information-theoretic notion of causality [13], with a simple factor of 2 relating 
the two quantities, 

Fy^x \z = 2Ty_>x \z ■ (12) 
Transfer entropy [3, 19] is defined by the difference in entropies 

r Y ^x\z = H(x\x- ®zr) -h(x\x~ @y~ ®zr) , (13) 

and quantifies the degree to which knowledge of the past of Y reduces uncertainty in the future of X. 
The equivalence (12) stems from the entropy of a Gaussian distribution being directly proportional 
to the logarithm of the determinant of its covariance matrix; and, furthermore, from any conditional 
entropy involving Gaussian variables being directly proportional to the logarithm of the determinant 
of the appropriate corresponding partial covariance matrix (see [13] for details). Due to the use of 
the determinant being crucial for this relationship, for trvMVGC the equivalence holds only in the 
more restricted situation when the predictee variable is univariate. 

In addition to motivating MVGC over trvMVGC, the equivalence (12) also provides a justification 
for the use of linear regression models in measuring causality. Transfer entropy is naturally sensitive 
to nonlinearities in the data, a property which is rightly seen as desirable for measures of causality and 
which has motivated the development of several nonlinear extensions to standard G-causality [20, 21]. 
However, when data are Gaussian, the two linear regressions capture all of the entropy difference that 
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defines transfer entropy, which implies that non-linear extensions to G-causality are of no additional 
utility. Indeed for two multivariate Gaussian variables X and Y, the partial covariance T<(X\Y), 
which is the same quantity as the residual covariance under linear regression, can be simply thought 
of as the conditional covariance of X given Y, because cov(Jf = y) = Ti{X\Y) for all y. Hence, 
for Gaussian data, linear regression accounts for all the dependence of the regressee on the regressor. 

To demonstrate formally that a stationary Gaussian AR process must be linear, consider a general 
stationary multivariate Gaussian process Xt satisfying 

X t = f(xt\)+e t , (14) 

where /(•) is some sufficiently well-behaved, possibly nonlinear function and the St are independent 
of Xt~ s f° r s — 1) 2, . . .. For any t then, St = Xt — f(x^\) is independent of -X^_i, so that, in 



(p) 

particular, for any value £ taken by X^_\, the conditional expectation 

(p) _ A — TP / Y. \ Y (p) 



B[e t | = = E[X t | = £j - /(*) (15) 

(p) 

does not depend on £ and nor, by stationarity, on t. But since by assumption Xt and -X"^ are 
jointly multivariate Gaussian, by a well-known result E( Xt \ Xp^-A depends linearly on and from 



(15) it follows that f(£) must be a linear function of 

4.2 Invariance under transformation of variables 

The partial covariance S(X|"K) transforms in a simple way under linear transformation of variables. 
If T and U are respective matrices for linear transformations on X and Y then we have that 

E(T-X\U-Y) =TY,(X\Y)T J . (16) 

Using this formula, and the properties of the determinant and trace operators, we can find the 
respective groups of linear transformations under which MVGC and trvMVGC are invariant. For 
MVGC, we find that the most general transformation that J~y->x\z 1S invariant under is given by 

X — > T xx ■ X , 

Y ^Ty X -X+Tyy-Y + Tyz-Z, (17) 

Z — > T zx ■ X + T zz ■ Z , 

where the matrices T xx , T yy and T zz on the diagonal are non-singular. All these symmetries are 
desirable properties for a causality measure. There ought to be invariance under redefinition of the 
individual variables within each of X, Y and Z, (i.e. under the diagonal components T xx , T yy and 
T zz of Eq. (17)), because MVGC is designed to measure causality between unified wholes rather than 
between arbitrarily defined constituent elements. The "off-diagonal" components T yx , T yz and T zx 
are also intuitive. Adding components of Z or X to the predictor Y should not change the value 
of MVGC, because MVGC is designed to measure the ability of Y at predicting X over and above 
Z and X. Similarly, adding components of X onto Z should not make a difference because the 
predictee X could already be thought of as a conditional variable before transformation. 
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trvMVGC has an invariance under a similar group of transformations but with one significant 
restriction, namely that the matrix T xx must be conformal (angle-preserving), that is T xx must 
satisfy T XX T XX J = cl for some constant c. This difference can have practical consequences. The 
broader invariance of MVGC (under all linear transformations T xx ) means that this measure, but 
not trvMVGC, is insensitive to certain common inaccuracies of data collection, namely those in 
which variables within a given set X are contaminated by contributions from other variables (see 
Discussion). To put this point another way, if one wishes to infer MVGC between hidden variables 
by analyzing MVGC between observed variables, these two quantities are actually the same if the 
relationship between hidden and observed variables is linear and can be written in the form given 
in Eq. (17). One may also wish to measure the MVGC from the independent components of the 
predictor to the independent components of the predictee. Again, the invariance properties of MVGC 
mean that one does not need to explicitly find these independent components; one can simply compute 
MVGC between observed components. These observations indicate that MVGC takes into account 
correlation between variables in a principled way. We see this explicitly in Section 4.3. 

The restriction T XX T XX J = cl for trvMVGC further implies that an uneven rescaling of the 
components of the predictee variable may change the value of J~%-^x \z- This too has practical 
implications, namely that trvMVGC but not MVGC can be affected by magnitude differences in 
the components of X, perhaps resulting from these components reflecting underlying mechanisms 
that are differently amplified or differentially accessible to the measuring equipment, a common 
situation in many neuroscience contexts (see Discussion). This sensitivity is undesirable because 
causal connectivity should be based on the information content of signals (c.f. Section 4.1), and not 
on their respective magnitudes. 

It is worth noting that for transfer entropy the symmetry group can be extended to include all 
non-singular (not necessarily linear) transformations of the predictee variable, since the entropies are 
invariant under such transformations. 9 Since G-causality is essentially a linear version of transfer 
entropy, the former should at least be invariant under the linear subgroup of transformations. 

4.3 Expansion of Multivariate Granger Causality 

MVGC is expandable as a sum of G-causalities over all combinations of univariate predictor and 
predictee variables contained within the multivariate composites. The existence of this expansion 
depends on the fact that determinants are decomposable into products, and that logarithms of 
products are decomposable into sums of logarithms. No such decomposition exists for the logarithm 
of a trace, and so there is no obvious way of expanding trvMVGC into combinations of univariate 
components. 

The expansion of MVGC is not entirely straightforward because different terms in the sum 
involve conditioning on the past and present of different subsets of variables. However each pre- 
dictor /predictee combination appears precisely once in the sum, and each term can be explained 
intuitively. The general formula may be written as 

n m 

Fy^X \Z = ^Y^Xi \Z®X®Y 1 ®...®Y a - 1 ®X°®...®X^ 1 > ( 18 ) 

i=l a=l 

9 If the predictee variable has a continuous (multivariate) distribution, we note that the Jacobian determinants in 
the standard change-of- variables formula for entropy calculation cancel out. 
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where the superscript '°' indicates conditioning on the present (in addition to the past) of the 
corresponding variables. Thus, in the term for causality from Y a to Xi one conditions on (i) the past 
of the entire multivariate conditional variable Z, (ii) the past of the entire multivariate predictee 
variable X, (iii) the past of all predictor variables Yr with (3 < a and (iv) the present of all predictee 
variables Xj with j < i. The derivation of the expansion (18) is given in Appendix B. 
For the case of a multivariate predictor and a univariate predictee we have 

Fy^X = Fy^X +J 7 Y 2 ^X\Y 1 +J r Y 3 ^X\Y 1 e,Y2-\ H T Y m -+X \Y x ®Y%®...®Y m ^ x ■ (19) 

This formula is consistent with the intuitive idea that the total degree to which the multivariate Y 
helps predict the univariate X is: the degree to which Y± predicts X, plus the degree to which Yi 
helps predict X over and above the information already present in Y±, and so on. 
For the case of a multivariate predictee and a univariate predictor we have 

Fy-*x = Fy^xj \x +J r Y ^x 2 |xex» + jr Y^x :i \x®x°®x° H l-^V-KXn \x®x°®x°®...®x°_ 1 ■ ( 20 ) 

This formula supports the intuition that the total degree to which the univariate Y helps predict 
the multivariate X is: the degree to which the past of Y helps predict the current value of X\ over 
and above the degree to which the past of the whole of X predicts the current value of X\ , plus the 
degree to which the past of Y helps predict the current value of X2 over and above the degree to 
which the past of the whole of X and the current value of X\ predicts the current value of X2, and 
so on. 

We remark on two implications of the expansion of MVGC. First, Ladroue and colleagues sug- 
gested that use of generalized residual variance for causal inference on high-dimensional data might 
suffer from problems of numerical stability. However, the expansion of MVGC into low-dimensional, 
univariate G-causalities suggests that there should be no problem (see Section 4.3.1 for numerical 
evidence of this). Second, the expansion (18) indicates that MVGC controls for, to some extent, 
the influence of unmeasured latent/exogenous variables (see also Section 5). By conditioning on the 
present of certain appropriate predictee variables for each term of the expansion, only the effects 
of each predictor on independent components of the predictees enter the equation. This property 
stems from the fact that the determinant of the residual covariance matrix reflects not just residual 
variances, but also the extent to which these residual variances are independent of each other. This 
is another advantage of the MVGC measure over trvMVGC, which does not depend on residual 
correlations. 

4.3.1 Stability of Multivariate Granger Causality 

We tested numerically our claim (Section 4.3) that MVGC should not be less stable than trvMVGC. 
We studied MVAR(l) processes whose dynamics are given by 

X t = A ■ X t _! + e t , (21) 

where X contains 8 variables, the sum of each row of A (i.e. total afferent to each element) is 0.5, all 
components in a given row of A are equal and positive, and each component of St is an independent 
Gaussian random variable of mean and variance 1. We generated 30 random "connectivity" matrices 
(or systems) At, (i = 1, . . . , 30), each with an average of 2 non-zero components per row. For each A{ 
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we obtained 10 sets of 3000 (post equilibrium) data points via Eq. (21). For each set, we computed 
the MVGC across each bipartition of the system corresponding to Ai. We then calculated, for each 
bipartition, the standard deviation of the MVGC across the 10 data sets and (excluding bipartitions 
with standard deviation less than 0.01) the corresponding coefficient of variation (CoV, standard 
deviation divided by mean). This procedure allowed us to obtain, for each Ai, a maximum CoV. 
Figure 1(a) shows that the maximum CoV is generally very small and never large, confirming the 
stability of MVGC. 

To compare the stability of MVGC with that of trvMVGC, for each Ai and for each bipartition 
we divided the CoV for MVGC by the CoV for trvMVGC. Figure 1(b) shows the distribution of 
the average of this ratio across all bipartitions. The clustering of this distribution at ~1, with no 
outliers, confirms that MVGC and trvMVGC have similar stability properties, at least in the systems 
we have simulated. 

To generalize these results we next used a genetic algorithm (GA) [22, 23] to see if we could 
find a network for which MVGC becomes unstable. The GA was initialized using a population 
composed of the 30 random systems Ai described above. We ran the GA for 130 generations. In 
each generation, we computed the fitness of each system as the maximum CoV of MVGC. Systems 
were selected to proceed to subsequent generations using stochastic rank-based selection. Mutations 
enabled the adding of new non-zero components to Ai, the removal of existing non-zero components, 
or the swapping of components, followed by renormalization of each row to sum to 0.5 again; two 
mutations were applied per system. After 130 generations (sufficient for fitness to asymptote) the 
average fitness (i.e. maximum CoV) in the population was ~0.25, and the maximum was 0.39, 
which is still a low value. For the A, that gave this highest value, we compared the CoV obtained 
using MVGC with that obtained using trvMVGC following the procedure described above. The 
average ratio (across all bipartitions) was ~1.00, (maximum value 1.12), indicating that MVGC and 
trvMVGC had similar stability properties even for systems optimized to be unstable with respect 
to MVGC. Further, we examined some Ai for which the sums of the rows differed (i.e. having 
heterogeneous afferent connectivity); these systems had similar stability properties to those described 
above. Finally, stability properties were unaffected when computations were based on 1000 (rather 
than 3000) data points. 

Taken together, these simulation results confirm that MVGC is numerically stable, and is not 
appreciably different from trvMVGC in terms of stability properties. 

4.4 Spectral decomposition 

In this section we review the spectral decomposition of G-causality [5, 1]. For simplicity we limit 
ourselves to the unconditional case, although the procedure may be readily extended to the condi- 
tional case (as described in e.g. Refs. [18, 1, 24]). We assume multivariate predictor and predictee 
variables, and show that MVGC but not trvMVGC has a satisfactory spectral decomposition. 
Consider the stationary MVAR 



v 




(22) 



We may write this as 



A(L) ■X t = e t 



(23) 
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Figure 1: Stability of MVGC. (a) Histogram of the maximum CoV of MVGC, observed over 10 trials 
of 3000 time-steps, for each of 30 different systems, as described in Section 4.3.1. (b) Histogram of 
the average ratio between the CoV of MVGC and the CoV of trvMVGC, for each of the 30 systems. 
MVGC is numerically stable (a) and is not appreciably different from trvMVGC in terms of stability 
properties (b). 



where L denotes the (single time step) lag operator, and 

A(L)^I-Y,A k L k 

k=i 



(24) 



Eq. (23) may be solved as 

X t = H{L) ■ e t , (25) 

where H{L) = A(L) . Transforming into the frequency domain via the discrete-time Fourier 
transform X(X) = Et=-oo X t e ~ M y ields A W ■ X W = e(A) (replace L by e~ iX ), so that 

X(X) = H(X) ■ e(A) , (26) 

where H(X) = A(X)~ 1 is the transfer matrix. The (power) spectral density of X is then given by 

S{X) = H{X)E(e)H*(X). (27) 

From a standard result [25], since H(L) is a square matrix lag operator with the identity matrix as 
leading term, we have 

— / m|if(A)H*(A)| dA = 0, (28) 

provided that all roots of the characteristic polynomial |^4(L)| lie outside the unit circle, which is a 
necessary condition for the existence of the stationary process (22). From (27) we may then derive 
the relation [26] 



1 

2^ 



ln|5(A)| dA = ln|E(e)| 



(29) 
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Consider now the stationary MVAR 



with coefficients matrix 



X t © Y t = A ■ ( x£\ Y^j) + e*,* e y ,t (30) 



A=(^ (31) 



A A 



and residuals covariance matrix 



E(e,©ey)==('*r ^) . (32) 



Let us split the corresponding transfer matrix H(X) as 

*W = ^>-'-(£w "Zm) <33) 



5(A) = ('f^m S a V ^h ■ 



and the spectral density as 

— ( 

K Sy X (X) Syy(X) / 

Then S xx (\) is just the spectral density of X , which from (27) is given by 

S XX (X) = H XX (X)^ XX H* X (X) + 2<Re{H xx (\)Z xy H* y (\)} + H xy (X)T, yy H* y (X) . (35) 

The idea is that we wish to decompose this expression into a part reflecting the effect of X itself and 
a part reflecting the causal influence of Y . The problem is that, due to the presence of the "cross" 
term, S XX (X) does not split cleanly into an X and a Y part. Geweke [5] addresses this issue by 
introducing the transformation 

X Y -+U- {X © Y) , (36) 

where 

\— ^yx^ xx l) ^ ^ 

Note that this transformation leaves the G-causality J-y-^x invariant (c.f. Section 4.2) and, for 
the transformed regression, we have = 0? that is, the residuals Sx-i^-y 

are uncorrelated. Thus, 

assuming the transformation (37) has been pre-applied, Eq. (35) becomes 

S XX (X) = H XX (X)T, XX H XX (X) + H xy (X)Y, yy H* y (X) , (38) 

whereby the spectral density of X splits into an "intrinsic" part and a "causal" part. The spectral 
G-causality of Y — )■ X at frequency A is now defined to be 

fy x(X) — ln( \Sxx{X) | \ (39) 

V \H XX (X)J^ XX H XX (X)\ J 

or, in terms of the untrans formed variables, 

fY X (A) = In ( IffrsMI j /^qn 

y \S X x(X) — H xy (X)T> y \ x H* y (X)\ J 
12 



with S XX (X) as in (35) and E y \ x = S ra - T, yx T, x ^'E X y. 

Geweke (Ref. [5], Theorem 2) then establishes the fundamental motivating relationship between 
frequency and time domain G-causality: 



1 

2^ 



/r^x(A) dA = T Y ->x , (41) 



provided that all roots of |^4 ra (L)| lie outside the unit circle. The proof of this relation relies 
crucially on the result (28) which, we note, involves the determinant of the transfer matrix. Thus if 
the trace, rather than the determinant, were to be used in the definition (39) for /v-^x(A) then we 
could not expect to obtain a relation corresponding to (41), since (i) the trace of the spectral density 
in Eq. (27) does not factorize, (ii) there is no trace analogue to Eq. (28), and thus (iii) no analogue 
to Eq. (29). This would seem to preclude a satisfactory spectral decomposition for the trace version 
of G-causality. Similar remarks apply to conditional G-causality in the spectral domain. 

In Ref. [4], however, it is conjectured that a trace analogue of Eq. (41) does indeed hold. To test 
this conjecture we performed the following experiment: we simulated 1000 MVAR(l) processes of 
the form 

x t e Y t = a ■ (x t -i e Y t -i) + £ x , t e e y>t , (42) 

where X has dimension 2 and Y dimension 1. Residuals £ x ti £ yt were completely uncorrelated, with 
unit variance (i.e. H{e x t ®e y t) was the 3x3 identity matrix) so that, in particular, the Geweke 
transformation (37) was unnecessary. For each trial the 3x3 coefficients matrix A was chosen at 
random with elements uniform on [— ^, and the process (42) simulated for 10 6 stationary time 
steps (the occasional unstable process was rejected). Time domain causalities Ty^x, Fy^x an d 
frequency domain causalities fy->x (A), fy-^xW were calculated in sample using p = 10 lags. (As 
noted previously, 10 equality in (41) is only assured in the limit of infinite lags; 10 lags was found 
empirically to achieve good accuracy without overfitting the data.) Relative errors of integrated 
spectral MVGC with respect to time-domain MVGC, expressed as a percentage, were defined as 

f - mn v ^ 5: /y ^ x(A) dA ~ Ty ^ x 
E% = 100 x , 

J~y^>x 

E% ee 100 x — J - /f%X i A) dA " , (43) 

Jy-tX 

for MVGC and trvMVGC respectively. (The integrals were computed by standard numerical quadra- 
ture.) Results, displayed in Table 1, confirm to good accuracy the theoretical prediction of Eq. (41) 
for MVGC (the small negative bias on E% is due to the finite number of lags), while for trvMVGC 
relative errors are several orders of magnitude larger and furthermore were not decreased by choosing 
longer stationary sequences and/or more lags. The full distribution of relative errors is also displayed 
as a histogram in Fig. 2. 



10 A subtlety to note is that even if the MVAR (30) has a finite number of lags p,q < oo, the exact restricted regression 
of X on its own past will generally require an infinite number of lags [5]. Thus in theory, for exact equality in (41), an 
infinite number of lags is required to calculate the term T,(X \ X.) which appears in J-y^x (using a finite number of 
lags will generally result in an overestimate of -7-V_>x, since residual errors will be larger than for the exact regression). 
As applied to empirical data, it is in any case good practice to choose "sufficient" lags for all regressions so as to model 
the data adequately without overfitting [27, 28]. 
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error 


mean 


std. dev. 


abs. mean 


E % 


-0.0004 


0.0005 


0.0005 




-0.0488 


10.5995 


8.1799 



Table 1: Comparison of relative errors of integrated spectral MVGC and trvMVGC with respect to 
time domain MVGC and trvMVGC, for a random sample of MVAR(l) processes. Top row shows 
MVGC, bottom row shows trvMVGC. See text for details. Figures in the "abs. mean" column are 
the means of the absolute values \E%\ and \Ey\. 
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Figure 2: Distribution of relative errors of integrated spectral multivariate G-causality with respect 
to the time domain for (a) MVGC (b) trvMVGC, for a random sample of MVAR(l) processes. 
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Figure 3: Comparison of MVGC and trvMVGC in the frequency domain: spectral MVGC and 
trvMVGC plotted against frequency for (a) a typical MVAR(3) process with dim(X) = 2, dim(V) = 
1 and (b) a typical MVAR(5) process with dim(X) = 3, dim(l^) = 2. 



We also repeated the experiment with higher order MVAR(p) processes, higher dimensional 
predictee and predictor variables and correlated residuals e x . In all cases, results confirmed the 
accuracy of (41) for MVGC and yielded large relative errors for trvMVGC. We remark that qualitative 
differences (i.e. aside from differences of scale) between spectral MVGC and trvMVGC could be 
substantial (Fig. 3). These differences, furthermore, appeared in general to be exaggerated by the 
presence of residual correlations; this is consonant with the sensitivity of MVGC as contrasted with 
the lack of sensitivity of trvMVGC to residual correlations (see Sections 4.3 and 5). 

It is straightforward to show that /v-s>x(A) is invariant under the same group of linear transfor- 
mations (17) as J-y->x] again, fy^x(^) wm m g enera l be invariant only under the restricted group 
with T xx conformal; this extends to the conditional case. 



5 Multivariate partial Granger causality 

Recently, a partial G- causality measure has been introduced [29] which exploits a parallel with the 
concept of partial coherence [30] in order to control for latent / exogenous influences on standard G- 
causality. Partial G-causality modifies the standard G-causality measure by including terms based 
on residual correlations between the predictee variable and the conditional variables. Consider, in 
addition to the regressions (7), the following regressions of the conditioning variable Zt- 

Z t = B- © z£\) + vt , 

V ' (44) 

Z t = B'-(x^ l& Y t ^ l@ Z ( [\)+r 1 ' t . 

Here the roles of the predictee and conditioning variables are reversed. Then for univariate predictor 
and predictee the partial G-causality of Y on X given Z is defined by conditioning the respective 
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residual co-variances for the regressions of X on the corresponding residuals for the regressions of Z: 



F ™ i- = 1ii IeRK)J' <45) 

This extends naturally to the fully multivariate case (c.f. Eq. (10)), and we define partial MVGC 
(pMVGC) as 

TP -1 ( \^ £t \Vt)\ \ (AR\ 

/ \E(x\x-ez-ez)\ \ 

\\z{x\x- @y- ®z- @z)\) y 1 

where the RHS (47) follows from the identity (64) derived in Appendix C, (with W = X~ © Z~ 
and W = X~ © Y~ © Z~ for the numerator and denominator terms respectively) . Comparing with 
(10) we see thus that pMVGC differs from MVGC in the inclusion of the present of the conditioning 
variable Z in the respective regressions. Seen in this form, it is clear that, as is the case for MVGC, 
pMVGC is always non- negative. 11 One could alternatively express pMVGC as (non-partial) MVGC 
conditioned on a "forward lagged" version of Z: defining Zt = Zt+\ we have Zt © Z\_ x = Z^_ 1 , or 
Z = Z © Z~ (note the additional lag on Z~), so that, from Eq. (47), 

F Y^X \Z = F y-^X \Z ■ (^) 

As noted in Section 4.3, (non-partial) MVGC to some extent already controls for the influence 
of latent /exogenous variables because the generalized variance is sensitive to residual correlations. 
However, pMVGC takes into account even more correlations with the explicit aim of controlling 
for latent /exogenous influences. pMVGC may therefore be preferable when such influences are ex- 
pected to be (a) strong and (b) relatively uniform in their influence on the measured system. Indeed, 
pMVGC (and the original measure of partial G-causality) can only be effective in compensating for 
latent/exogenous variables that affect all modeled variables (i.e. predictee, predictor and condition- 
ing) to a roughly equal degree [29]. 

It is interesting to note that pMVGC may be expressed in terms of non-partial MVGCs as 

^Y^x \z = ^Y^zisx ~ Fy^z \x ■ (49) 

by straightforward application of Eq. (3). As expected, (49) includes a term with a mandatory 
multivariate predictee, since it is only in this case that residual correlation can make a difference. 
It is interesting that Z appears as a predictee variable; this might be understood as pMVGC using 
the conditioning variable Z as a "proxy" by which to assess the influence of latent or exogenous 
variables. 

A "trace" version of pMVGC may be defined analogously to (46). Again by Eq. (64) of Appendix 
C, the identity corresponding to (47) will hold, as will the trace analogue of (48). However, the 



11 In [29] it is stated that partial G-causality may in some circumstances be negative; the justification for this is 
unclear. 
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analogue of (49) will not hold in general, since the traces of the partial covariance matrices will in 
general not factorize appropriately. 12 

From (48) it is straightforward to derive a spectral decomposition fy-+x |z(^) ^ or pMVGC, which 
will integrate correctly to the time-domain pMVGC J~y^,x \z- Again, a spectral decomposition for 
the corresponding trace version is likely to be problematic, insofar as it will fail in general to integrate 
correctly to the time-domain value (c.f. Section 4.4). 



6 Causal density 

A straightforward application of MVGC is to measures of causal density, the overall level of causal 
interactivity sustained by a multivariate system X. A previous measure of causal density [22] has 
been defined as the average of all pairwise (and hence univariate) G-causalities between system 
elements, conditioned on the remaining system elements: 13 

where Xuji denotes the subsystem of X with variables Xi and Xj omitted, and n is the total number 
of variables. Causal density provides a useful measure of the dynamical "complexity" of a system 
inasmuch as elements that are completely independent will have zero causal density, as will elements 
that are completely integrated in their dynamics. Exemplifying standard intuitions about complexity 
[31], high causal density will only be achieved when elements behave somewhat differently from each 
other, in order to contribute novel potential predictive information, and at the same time are globally 
integrated, so that the potential predictive information is in fact useful [32, 33]. 

Using MVGC, various extensions to (50) can be suggested, based on the various possible inter- 
actions between multivariate predictors, predictees and conditional variables. These extensions may 
provide a more principled measure of complexity by analyzing a target system at multiple scales. 
First we define the causal density from size k to size r, cd k -> r (X), as the average MVGC from a 
subset of size Ho a subset of size r, conditioned on the rest of the system: 



cdfc^. r (X) = XT^ 7 v k ^u r \w n ~ k ~ r ' (51) 
1 i=i 

where X = V t k U VJ U W^~ k ~ r denotes the i th of the n k>r = (£) ( n ~ k ) distinct tripartitions of X 
into disjoint sub-systems of respective sizes k, r and (n — k — r). Then using this, one could define 
the bipartition causal density (bed) as the average of cdi c _ j .^ n _^(X) over predictor size k, 

n—l 

bcd(X) = —j £ cd k ^ [n _ k) (X) . (52) 

k=l 



12 In [4], under the section headed "Partial Complex Granger causality", the quantity developed appears to be 
(the trace version of) what is conventionally referred to as conditional G-causality, rather than partial G-causality as 
introduced in [29] and referenced in this section. 

13 This is the "weighted" version of causal density. An unweighted and [0,1] bounded alternative can be defined as 
the fraction of all pairwise conditional causalities that are statistically significant at a given significance level. 
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Interestingly, this quantity is closely related to the popular Tononi-Sporns-Edelman "neural com- 
plexity" measure [34] which averages (contemporaneous) mutual information across bipartitions; (we 
are currently exploring this in work in preparation). It could also be interesting to compare causal 
density at different scales of predictor plus predictee size; thus we define 

1 5-1 

cd s (X) = —— ^ c dt_>(s-fc) (X) . (53) 

k=l 

Then the original causal density measure of Eq. (50) is just cd2 and bed is cd n . The average of this 
over all scales can be used to define a complete tripartition causal density (ted): 

1 n 

tcd(X) = V cd s (X) . (54) 

n — 1 ^-^ 

s=2 

A comparison of the properties of all versions of causal density, as well as related complexity mea- 
sures, is in progress. We remark that it is straightforward to define spectral versions of these causal 
density measures. 



7 Autonomy in complex systems 

G-causality has recently been adapted to provide an operational measure of "autonomy" in complex 
systems [11]. A variable X can be said to be "G-autonomous" with respect to a (multivariate) set 
of external variables Z if its own past states help predict its future states over and above predictions 
based on Z. This definition rests on the intuition of autonomy as "self determination" or "self 
causation" . We can formalize this notion along the lines of MVGC as follows. Consider the regressions 

X t = A-z£} 1 +e t , 

X t = A' ■ (xffl Z«) + e' t , 



which differ from Eqs. (7) primarily because the predictee variable X is not regressed on itself in 
one of the equations. The G-autonomy of X is then given by 

(56) 



The extension of G-autonomy to the multivariate case is important because it accommodates situa- 
tions in which groups of elements may be jointly autonomous (self-determining, self-causing), even 
though the activity of individual elements within the group may be adequately predicted by com- 
binations of activities of other elements in the group. Univariate formulations of G-autonomy [II] 
would fail in these cases. Consider as a trivial example an element X\ which is G-autonomous with 
respect to a background Z. If X\ is now duplicated by the element X2 it will no longer appear 
as G-autonomous within the multivariate system X\®Xi®Z . However, the multivariate variable 
X\®X2 will be (jointly) G-autonomous with respect to Z. 
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As discussed in [11] G-autonomy also provides the basis for a notion of "G-emergence" as ap- 
plied to the relation between macroscopic variables "emerging" from the activity of microscopic 
constituents. G-emergence operationalizes the intuition that a macro-level variable is emergent to 
the extent that it is simultaneously autonomous from and dependent upon its micro-level constituents 
[11, 35]. Extension of G-emergence to the multivariate case using MVGC is straightforward, allowing 
consideration of multivariate micro- and macro- variables. 

8 Macroscopic variables and causal independence 

Given the ability to assess multivariate causal interactions, a second challenge arises: the identifica- 
tion of relevant groupings of variables into multivariate ensembles. One approach to this challenge 
adopts the perspective of statistical mechanics on the emergence of novel macroscopic variables, given 
a microscopic description of a system [36, 37]. Here, we suggest that MVGC may furnish a useful 
method for macro- variable identification in this context. Let us assume that Z% represents a set of 
microscopic variables defining a complex (possibly stochastic) dynamical system, and Xt = f(Zt) a 
set of macroscopic variables functionally (possibly deterministically) dependent on the microscopic 
variables. There is then a sense in which X represents a "parsimonious" high-level description of the 
system, to the extent that it predicts its own dynamical evolution without recourse to the low level 
of description of the system represented by Z\ that is, to the extent that X exhibits strong causal 
independence with respect to Z. In this view, J~z->x furnishes a natural measure of the lack of 
this causal independence, which might then be used to identify parsimonious macroscopic variables 
by minimizing J : z~>f{Z) over candidate functions /(•)■ The multivariate formulation MVGC would 
appear to be significant in this context for reasons similar to the G-autonomy case. Specifically, it 
may be that a set of macroscopic variables X may jointly have high causal independence with respect 
to the microscopic variables Z, while the component variables Xj may individually have lower causal 
independence. 

The notions of G-autonomy, G-emergence, and causal independence are distinct but related. In 
short, G-autonomy measures "self-causation", causal independence measures the absence of use- 
ful predictive information between microscopic and macroscopic descriptions of a system, and G- 
emergence measures a combination of macro-level autonomy and micro-to-macro causal dependence. 
It is possible, and is left as an objective of future work, that all three measures could be applied 
usefully to systems that avail multiple levels of descriptions, (i) to identify relevant groupings of 
observables at each level, (ii) to decompose causal interactions within each level, and finally (iii) to 
quantitatively characterize inter-level relationships. 

9 Discussion 

We have described and motivated a measure of multivariate causal interaction that is a natural ex- 
tension of the standard G-causality measure. The measure, originally introduced by Geweke [5] but 
almost totally overlooked since, uses the generalized variance (the determinant of the residual co- 
variance matrix) and we have termed it multivariate G-causality (MVGC). It contrasts with another 
recent proposal [4] for addressing the same problem which uses instead the total variance (the trace 
of the residual covariance matrix). In this paper, we have presented several theoretical justifications, 
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augmented by numerical modeling, for preferring MVGC over the trace version, which we summarize 
below. We have also extended MVGC to address novel challenges in the analysis of complex dy- 
namical systems, including quantitative characterization of "causal density", "autonomy", and the 
identification of novel macroscopic variables via causal independence. 

9.1 Importance of multivariate causal analysis 

In many analyses of complex systems, particularly in neuroscience and biology, there may be no sim- 
ple or principled relationship between observed variables and explanatorily relevant collections, or 
ensembles, of these variables. In the Introduction we already remarked on fMRI, where explanatorily 
relevant ROIs are each composed of multiple observables (voxels) which are arbitrarily demarcated 
with respect to underlying neural mechanisms. Other non-invasive neuroimaging methods share 
similar varieties of arbitrariness: both electroencephalography (EEG) and magnetoencephalography 
(MEG) provide signals which are complex convolutions of underlying neural sources. In these and 
similar cases, multivariate causal analysis, and MVGC in particular, can be used to aggregate univari- 
ate observables into meaningful multivariate (ensemble) variables. It bears emphasizing that MVGC 
is fundamentally different from conditional G-causality [38], which assesses the causal connectivity 
between two univariate variables, conditioned on a set of other variables. 

Even when it is possible to measure directly the activity of variables of interest, it is still important 
to consider multivariate interactions. Continuing with the neuroscience example, it may be that 
multiple ROIs act jointly to influence other ROIs, or cognitive and/or behavioral outputs. In single 
cell recordings this point is even more pressing: since Hebb [39] it has been increasingly appreciated 
that neurons act as ensembles, rather than singly, in the adaptive function of the brain [40]. MVGC 
is well suited to disclosing causal relationships among these ensembles as a window onto underlying 
principles of brain operation. 

Of course, the application of MVGC is not limited to neuroscience. Multivariate interactions are 
likely to be important in a very broad range of application areas. For example, genetic, metabolic, 
and transcriptional regulatory networks may be usefully decomposed into multivariate ensembles 
influencing other such ensembles [4]. Indeed, multivariate interactions may be important in any 
system, natural or artificial, which can be described in terms of multiple simultaneously acquired 
time series. 

9.2 Generalized variance vs total variance 

A different approach to multivariate causal analysis was recently proposed by Ladroue and colleagues 
[ ] . This involved a measure (which we call trvMVGC) based on the trace of the residual covariance 
matrix (the total variance), rather than the determinant (the generalized variance). Geweke [5] pro- 
vided the original justifications for the determinant form, but did not explicitly discuss the trace form. 
As noted in Section 3 of Ref. [5], Geweke's motivations included (i) MVGC is invariant under (linear) 
transformations of variables, and (ii) the maximum likelihood estimator of MVGC is asymptotically 
X 2 -distributed for large samples; (there is no standard test statistic for trvMVGC). In this paper we 
have substantially enhanced this list, in each case comparing MVGC explicitly with trvMVGC. In 
summary: (iii) MVGC is fully equivalent to transfer entropy under Gaussian assumptions, whereas 
for trvMVGC this equivalence only holds for the univariate case; (iv) MVGC is invariant under 
all (non-singular) linear transformations of the predictee variable, while trvMVGC is invariant only 
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under conformal linear transformations (see below); (v) only MVGC is expandable as a sum of uni- 
variate G-causalities; (vi) MVGC but not trvMVGC admits a satisfactory spectral decomposition, 
inasmuch as it guarantees a consistent relationship with the corresponding time-domain formulation; 
(vii) only MVGC depends on residual correlations, and through these accommodates in a natural 
way the influence of exogenous or latent variables, and (viii) the partial version of MVGC, pMVGC 
is decomposable in terms of non-partial MVGCs, but this is not true in general for trvMVGC. 

All the above factors suggest that MVGC should be preferred to trvMVGC. Taken individu- 
ally they may differ in their significance but taken together they emphasize that MVGC, but not 
trvMVGC, provides a comprehensive and theoretically consistent extension of standard G-causality 
to the multivariate case. While this consistency is the most important reason to prefer MVGC to 
trvMVGC, let us consider further three of the individual properties. First, the equivalence with 
transfer entropy is important because it justifies the use of linear modeling for multivariate causal 
analysis, at least where Gaussian assumptions are reasonable. Second, the broader range of invariance 
is important because it means that MVGC is robust to a wider range of common inaccuracies during 
data collection, in particular those in which univariate variables are contaminated by contributions 
from other variables and in which different components of multivariate ensembles are differently 
scaled by measurement constraints. It is likely that this additional robustness will have significant 
practical importance in many experimental applications, for example in EEG and MEG where in- 
dividual sensors detect signals from multiple neural sources and may differentially amplify these 
sources according to their distance from the sensors and their alignment with the cortical surface. 
Finally, the lack of a satisfactory spectral version of trvMVGC, which we establish both theoretically 
and numerically (Section 4.4 and Figures 2 and 3), implies that frequency-domain results obtained 
using trvMVGC are unreliable, both in their magnitude and in their spectral profile. 

Ladroue et al. [4] note Geweke's form (i.e. MVGC) and suggest trvMVGC is preferable in view 
of possible numerical instabilities attending the computation of determinants for high-dimensional 
data. However the existence of an expansion of MVGC in terms of univariate G-causalities (18) 
seems to counter this claim, since the univariate causalities would not be expected to be unstable. 
Numerical simulations (Section 4.3.1 and Figure 1) confirm our view. 

9.3 Quantities derived from MVGC 

In the second part of the paper we used MVGC to derive several novel measures that have the 
potential to shed substantial new light on complex system dynamics. 

First, MVGC leads immediately to a series of redefinitions of our previous "causal density" 
measure [22], which aims to capture the dynamical complexity of a system's dynamics in terms of 
coexisting integration and differentiation. Extension to the multivariate case allows causal density to 
be evaluated at multiple levels of description thus furnishing a more principled measure of dynamical 
complexity. Causal density has been suggested as a measure of neural dynamics that captures 
certain aspects of consciousness [32]. It has been shown 14 to increase in response to perceived stimuli 
as compared to non-perceived stimuli in a visual masking task [41], and it captures the complex 
dynamics of small-world networks more effectively than does a prominent competing measure, neural 
complexity [33]. Multivariate causal density has the potential to further strengthen and generalize 
these contributions. 

14 In approximation. 
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Second, MVGC can be used to generalize the concept of G-autonomy, which operationalizes the 
notion of autonomy as "self causation" [11]. Multivariate G-autonomy is a significant enhancement 
because it deals with the case in which a group of variables may be jointly autonomous even though, 
individually, no variable is autonomous. Our results therefore pave the way to informative application 
of this measure to complex systems. 

Third, MVGC can be helpful in considering relations between microscopic and macroscopic levels 
of description of a system. One approach is to consider how causally independent a macroscopic 
variable is, with respect to its set of constituent micro- variables. We have suggested that this notion 
can be used to identify parsimonious macro-variables by maximizing causal independence over a 
space of functions relating micro- and macro- variables. Alternatively, the concept of G-emergence 
operationalizes the idea that an emergent macro-variable is both autonomous from and causally 
dependent on its underlying micro-level constituents. Unlike the "causal independence" view, G- 
emergence may be better suited to characterizing the degree of emergence as opposed to identifying 
prospective macro- variables; G-emergence also explicitly measures micro-to-macro causal dependence 
rather than assuming that it is present. 

Finally, the concepts of redundancy and synergy amongst variables have been recently intro- 
duced, via the use of a variant of the trvMVGC measure [42]. These quantities aim at detecting 
functionally relevant partitions of a system by grouping variables according to their summed causal 
influences. Because of the advantages of MVGC over trvMVGC, we suggest it may be useful to 
redefine redundancy and synergy in terms of MVGC. 

9.4 Summary 

Models of complex systems typically contain large numbers of variables. Having a measure for 
directed interactions between groups of variables, as opposed to just single variables, provides a useful 
tool for the analysis of such systems. We have demonstrated that MVGC is such a measure, and we 
have provided a series of justifications, theoretical and numerical, to prefer it over a related measure, 
trvMVGC. Like all measures of directed interaction based on G-causality, MVGC can be measured 
for freely collected data, without perturbing or providing inputs to the system. Finally, in contrast 
to alternative approaches such as structural equation modeling [43] or dynamic causal modeling [2], 
MVGC can be applied with very little prior knowledge of the system under consideration. 
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Appendix 

A Minimizing the determinant of the residuals covariance matrix 

We wish to show that minimizing the determinant |S(e)|, where e = X — A ■ Y as specified in (4), 
leads to the same values (5) for the regression coefficients A. We thus solve for A in the simultaneous 
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equations 



me) 



0, 



where % runs from 1 . . . n, a from 1 . . . m and S(e) is given by 

S(e) = S(X) - S(X,F)^ T Y) T +A£(Y),4 T . 

We use the formula for an invertible square matrix B 
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Assuming S(e) invertible and setting W = |S(e)| 1 we have 
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after gathering terms and simplifying, and Eq. (5) follows. 
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B Proof of expansion of multivariate Granger causality 

Here we prove Eq. (18). We consider the case of there being no conditional third variable, since the 
extension to this case is trivial. We first expand in terms of predictor variables according to 
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= Fy^x + Fy^x^ +J r Y 3 ^x\Yi®Y 2 H 1 - •7 r y m ->x|yi©Y 2 ®...©Y m _i • (60) 

To expand in terms of predictees we use the expansion 

|E(X|VT)| = E(X 1 )E(X 2 | W©Xi)E(X 3 |W©X 1 ©X 2 )---E(X n | W©Xi©...X n _!) , (61) 
which follows from repeated application of Eq. (3). We obtain 
|E(X|X-)| \ 



?y x ^x = log 



loe 



|E(X|X-©Yf)|y 
E(X I X-) E(X 2 I X- © Xi) ■ ■ ■ E(X n I X- © X 1 © X 2 © . . . © X n _i) 



£(X I X- © Yf ) E(X 2 I X- © Yf © Xx) • • -E(X n I X- © YT ffiXi ©X 2 © . . . © X n _i) ^ 
= -^Yi->Xi |x + Fy 1 ^x 2 \x®x° + -^yi^Xg ixexfexo H ^ -^y^x™ |x©x°©A-°e...®x£_ 1 > ( 62 ) 
and similar for the other components of the sum in Eq. (60), from which the result follows. 

C Partial covariance of residuals for two variables jointly depen- 
dent on a third 

Given the regressions 

X = A- W + e, 

Z = B W + rj, (63) 

where the regression coefficients A, B are derived from an ordinary least squares, Yule- Walker or 
equivalent procedure, we show that 

E(e\rj) = S(X|Zffi W) , (64) 
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assuming that all (partial) covariance matrices which appear below are invertible. We have 

E( £ ) = E(X|W) , 

E(r/) = E(Z|W), (65) 
E(e,r 1 ) = E(X,Z\W) . 

Thus we may calculate that 

S(e|»7) = E(X|VT)-E(X,Z| W) E(Z | W) -1 E(Z, X | W) . (66) 

Using the block matrix inversion formula for E(Z © W), we may also calculate that 

E(X|Z©VT) = E(X)-E(X,Z| W)E(Z| W^ 1 E(Z, X)-E(X, W | Z) E( W | Z)" 1 E(W, X) . 

(67) 

Now expanding the E(X | W) = E(X) - E(X, W) E(W) _1 E(W, X) term in (66), we find using 
(67) that (64) is equivalent to 

E(X, W) ^(Wy 1 E(W, X) + E(X, Z | W) E(Z | W) _1 E(Z, X | W) 

= E(X, Z | W) E(Z | WT 1 E(Z, X) + E(X, W | Z) E( W \ zy l E(W, X) . 

Or, rearranging and factorizing, 

E(X,VT)E(Vr) _1 -E(X,W|Z)E(W|Z) _1 j E(W,X) 

= E(X,Z|VT)E(Z|W)^ 1 [E(Z,X)-E(Z,X|W)] . (68) 

Now the term in square brackets on the RHS of (68) simplifies to E(Z, W) E(VF) -1 E(W, X) so 
that, factoring out E(W,X), (68) is equivalent to 

e(x, w) E(wy l -e(x,w|z)e(w| zy 1 - s(x, z | w) e(z i wy 1 e(z, w) e(w) -1 

xE(W,X) = 0. (69) 

We now show that the term in the square brackets in (69) is zero; i.e. that 

E(X, W) E(VT)" 1 - E(X, W | Z) E( W | Z)" 1 - E(X, Z | W) E(Z | W)' x E(Z, W) E(W)- 1 = , 

(70) 

thus proving (64). Rearranging and factoring out Ti(W) , (70) becomes 

E(X,VT)-E(X,Z|VT)E(Z|Vr)^ 1 E(Z,VF)j E(VT) -1 = E(X, W \ Z) E( W \ Z)~ l , 
or, multiplying through on the right by E( W | Z), 

E(X, W) - E(X,Z | W) E(Z | W)" 1 E(Z, W)l S(W)" 1 E( W \ Z) = E(X, W \ Z) . 
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Expanding E(W r | Z), factorizing and rearranging again, we get 
£(X, Z) - E(X, W) Z(Wy l E(W, Z)l E(Z)" 1 £(Z, W) 

= z(x,z\w)z{z\wy 1 z(z,w)z{wy 1 z{w\z) , 

or, since the term in square brackets on the LHS is just £( X, Z \ W), 

E(X,Z\ W) hizy 1 £(Z, W) - E(Z | W)" 1 £(Z, W) E(W) _1 E(W | Z) . 
We now show that, again, the term in square brackets is zero; i.e. that 

e(z) -1 e(z, w) = s(z | wy 1 s(z, w) ^{wy 1 s( w | z) . (71) 

Multiplying through on the left by E(Z | VF), (71) is equivalent to 

s(z | w) ^{zy 1 e(z, w) = s(z, w) ^(wy 1 e( w | z) , 

which follows immediately on expanding E(Z | W) and E(VF | Z), thus establishing (64). 
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